Coding Code: Investigating Students’ Data Science Skills with Qualitative Methods
Today’s layout
Investigating student learning through code
What research has been done?
A great deal of research has focused on what to teach in data science courses, but little has focused on how students learn data science concepts.
Thus far we have detailed…
concepts or competencies that ought to be included in data science programs
perspectives on when to teach data science
how to teach data science concepts
methods for integrating data science into the classroom
assorted topics to be considered in data science courses
Drawing on research in Computer Science Education
The Importance of Students’ Attention to Program State (Lewis 2012)
Attends to both the code produced by a student and their learning process
Pairs a student’s code with their debugging behavior side-by-side
These analyses of students’ code should not be few and far between. Students’ code offers a unique avenue for qualitative research in the teaching and learning of computing.
A framework for analyzing students’ code (Schulte 2008)
| | Text Surface | Program Execution | Function |
|---|---|---|---|
| Macrostructure | Understanding the overall structure of the program | Understanding the “algorithm” of the program | Understanding the goal / purpose of the program (in its context) |
| Relations | References between blocks, e.g., method calls, object creation | Sequence of method calls, object sequence diagrams | Understanding how sub-goals are related to goals, how function is achieved by subfunctions |
| Blocks | Regions of interest (ROI) that syntactically or semantically build a unit | Operation of a block, a method, or a ROI (as a sequence of statements) | Function of a block, may be seen as a sub-goal |
| Atoms | Language elements | Operation of a statement | Function of a statement, only understandable in context |
How could this look?
Atoms
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
Block
anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)
summary(anterior)
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
abline(anterior)
plot(anterior)
Relationships Between Blocks
anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)
summary(anterior)
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
abline(anterior)
plot(anterior)
posterior2 <- lm(ProximateAnalysisDataOutlier$PSUP ~ ProximateAnalysisDataOutlier$Lipid)
summary(posterior2)
with(ProximateAnalysisDataOutlier, plot(PSUP~Lipid, las=1, xlab = "Whole-body Lipid Content (%)", ylab = "UP Fatmeter Reading"))
abline(posterior2)
plot(posterior2)
posterior2
How can this be used for learning trajectory research?
Descriptive coding
“Filters a vector of values using extraction operator, based on an equality relation with a variable selected from dataframe using $ operator”
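As a minimal sketch of a statement this descriptive code could apply to, consider the R snippet below. The data frame `growth`, its columns, and its values are hypothetical and invented purely for illustration; they are not drawn from any student's work.

```r
# Hypothetical data frame, invented for illustration only
growth <- data.frame(
  Age    = c(4, 5, 5, 6, 5),
  Weight = c(310, 420, NA, 515, 440)
)

# growth$Age selects a variable from the data frame with the $ operator;
# == 5 builds the equality relation (a logical vector);
# [ ] (the extraction operator) filters the Weight vector with it.
weightAge5 <- growth$Weight[growth$Age == 5]

# Summarise the filtered values, ignoring the missing weight
meanWeightAge5 <- mean(weightAge5, na.rm = TRUE)
```

Each part of the description (extraction operator, equality relation, `$` selection) maps onto one visible piece of the statement, which is what lets a descriptive code be applied consistently across many students' scripts.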
Uncovering emergent themes
linearAnterior <- lm(PADataNoOutlier$Lipid ~ PADataNoOutlier$PSUA)
early <- subset(RPMA2Growth, StockYear < 2006)
Weight5 <- mean(RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 5], na.rm = TRUE)
gas <- gas[!(substr(gas$sampleID,3,3) %in% c("b","c")), ]
obsD <- subset(gas, gas$carboy == "D")$N15_N2_Ar
lowerCIBound <- pMat[1:mlleIndex,1][which.min(abs(mlleCI+likelihoods[1:mlleIndex]))]
Data wrangling
Statements of code whose purpose is to prepare a dataset for analysis and / or visualization
Sub-themes
An alternative direction
Practical considerations
How much code should I collect?
How do readers trust my analysis?
Trust comes from:
Why is this important for data science education?
Theobold et al. (2023)
How can we distinguish merely interesting learning from effective learning (Wiggins and McTighe 2005)?
Questions?